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1. Introduction 

The distribution of a discrete random variable is referred to as 
a distribution with a power-law tail if it falls as 



p(k)~k-v 



(1) 



for k G N and k > k m i a - Power-laws are ubiquitous distribu- 
tions that can be found in many systems from different disci- 
plines, see lilyjl and references therein for some examples. 

Experimental data of quantities that follow a power-law are 
usually very noisy; and therefore obtaining reliable estimates 
for the exponent y is notoriously difficult. Estimates that are 
based on graphical methods are certainly used most often in 
practice. But simple graphical methods are intrinsically unre- 
liable and not able to establish a reliable estimate of the expo- 
nent y. 

For that reason, the authors of 0] introduced an alternative 
approach based on a maximum likelihood estimator for the 
exponent y. Unfortunately the authors concentrate on a rather 
idealized type of power-law distributions, namely 



p\{k;r) 



k-y 



(2) 



with k G N, where the normalization constant £(y, 1) is given 
by the Hurwitz-£-function which is defined for y > 1 and a > 
Oby 



C(7,«) = I 



1 



r ) ('+«) ) 



(3) 



The distribution is characterized by one parameter only, 
and therefore all properties of this distribution (e. g. its mean) 
are determined solely by the exponent y. In many applications 
the power-law (0 is too restrictive. 

If one states that a quantity follows a power-law, then this 
means usually that the tail {k > fc mul ) of the distribution p(k) 
falls proportionally to k~Y. Probabilities p(k) for k < A: m j n may 
differ from the power-law and admit the possibility to tune 
the mean or other characteristics independently of y. In some 
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situations probabilities p(k) may differ from a power-law for 
k > &max as well, e. g. the distribution may have an exponential 
cut-off. 

Therefore, I will generalize the maximum likelihood ap- 
proach introduced in 1 3] to distributions that follow a power- 
law within a certain range A: m ; n < k < A: max but differ from a 
power-law outside this range in an arbitrary way. Furthermore, 
I will give some statements about the large sample properties 
of the estimate of the power-law exponent and present a nu- 
merical procedure to identify the power-law regime of the dis- 
tribution p(k). But first, let us see what is wrong with popular 
graphical methods. 



2. Trouble with graphical methods 

All graphical methods for estimating power-law exponents are 
based on a linear least squares fit of some empirical data points 
(xi,yi), {x2,yi),- ■ ■ , (xm^m) to the function 

y(x)=a + a l x. (4) 

The linear least squares fit minimizes the residual 



M 



A= £(j;-ao-«i*i\ 



(5) 



Estimates arj and a\ of the parameters «o and a\ are given by 

a 

, (E£i*) (E£i*?)-(E£i*) (!,";. VM7) 
a o = —r, — —r, — -5 Co) 



and 



a\ = 



(7) 



The ansatz for the residual Q and derivation of (|6j and 
are based on several assumptions regarding the data points 
(xi,yi). It is assumed that there are no statistical uncertain- 
ties in xi, but yt may contain some statistical error. The errors 
in different yi are independent identically distributed random 
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variables with mean zero. In particular the standard devia- 
tion of the error is independent of jc,\ For various graphical 
methods for the estimation of the exponent of a power-law 
distribution these conditions are not met, leading to the poor 
performance of these methods. 

To illustrate the failure of graphical methods by a computer 
experiment N = 10000 random numbers m, had been drawn 
from distribution (0 with 7=2.5 and an estimate y for the ex- 
ponent y was determined by various graphical methods. The 
estimator is a random variable and its distribution depends on 
the method that has been used to obtain the estimate. Impor- 
tant measures of the quality of an estimator are its mean and 
its standard deviation. If the mean of the estimator equals the 
true exponent y then the estimator is unbiased and estimators 
with a distribution that is concentrated around y are desirable. 
For each graphical method a histogram of the distribution of 
the estimator was calculated to rate the quality of the estimator 
by repeating the numerical experiment 500 times. 

The most straight forward (and most unreliable) graphical 
approach is based on a plot of the empirical probability dis- 
tribution p(k) on a double-logarithmic scale. Introducing the 
indicator function ![■], which is one if the statement in the 
brackets is true and else zero, the empirical probability distri- 
bution is given by 



m 



i n 
-vn 



= *]. 



(8) 



An estimate y for the power-law exponent y is established by 
a least squares fit to 

(xi,yi) = (lnJt,ln p(k)) for all k G N with p(k) > 0, (9) 

y equals the estimate Q for the slope, see Figure^- Because 
the lack of data points in the tail of the empirical distribution 
this procedure underestimates systematically the exponent y, 
see Table H] 

There are two ways to deal with the sparseness in the tail of 
the empirical distribution, logarithmic binning and consider- 
ing the empirical cumulative distribution P(k) instead of p(k). 
The cumulative probability distribution of is defined by 



SC(y,i) 



(10) 



If p(k) has a power-law tail with exponent y then P(k) follows 
approximately a power-law with exponent y — 1 because for 
k ^> 1 the distribution P(k) can be approximated by 



P(k) 



-r 



■di 



jfci-r 



(11) 



, k c(r,i) (y-i)C(y.i)' 

The empirical cumulative probability distribution is given by 



= ^gl[«i>*] 



(12) 



It is less sensitive to the noise in the tail of the distribution and 
therefore a fit of 

(xi,yi) = (]nk,]nP(k)) for all k G N with P(k) > (13) 



Table I: Mean and standard deviation of the distribution of the esti- 
mate for the power-law exponent y for various methods. All methods 
have been applied to the same data sets of random numbers from 
distribution |5J with y = 2.5. See text for details. 



method 


mean 
estimate 


standard deviation 
of estimate 


fit on empirical distribution 


1.597 


0.167 


fit on cumulative empirical distri- 
bution 


2.395 


0.304 


fit on empirical distribution with 
logarithmic binning" 


2.397 


0.080 


fit on cumulative empirical distri- 
bution with logarithmic binning 


2.544 


0.127 


maximum likelihood 


2.500 


0.016 



"In 01 a similar experiment is reported. For a fit of the logarithmically 
binned empirical probability distribution the authors find a systematical bias 
of 29 %. I cannot reconstruct such a strong bias, instead I get a bias of 5 % 
only. Probably the quality of this method depends on the details of the binning 
procedure. 



to a straight line gives much better estimates for the exponent, 
see Figure[2b. But there is still a small bias to too small values 
and the distribution of this estimate is rather broad, see Table|I] 
Logarithmic binning reduces the noise in the tail of the em- 
pirical distributions p(k) and P(k) by merging data points into 
groups. By introducing the logarithmically scaled boundaries 



round c' with some c > 1 



(14) 



(The function roundx rounds x to the nearest integer.) a linear 
least squares fit is performed to 



(xuyi) 



In 



bt+bi 



-l 



or 



(xuyi) = In- 



b i+1 -l 



bi+i-l 

,m £ 

k=bj 



>ln £ 



P{k) 
bi+i — bi 



P(k) 



tt, bi+i-bi 



(15) 



(16) 



respectively. As a consequence of the binning the width of 
the distribution of the estimate y of the power-law exponent 
y is reduced, see Figure \l^d and Table |l] According to 
the numerical experiments a fit of the logarithmically binned 
cumulative distribution gives the best results among graphical 
methods. It shows the smallest systematic bias. 

All the methods that have been considered so far have a 
common weakness. In the deviation of and Q it was as- 
sumed that the standard deviation of the distribution of the 
error in yi is the same for all data points (xi,yi). But this is 
obviously not the case. For fixed k the empirical distribution 
p(k) is a random variable with mean pi(k;y) and standard 
deviation yf pi (k; y) (1 — pi (k; y)) /N. For the corresponding 
data on a logarithmic scale the standard deviation is approxi- 
mately given by the quotient 



^ Px {k-y)(l-p x {k-y))/N _ l-pi(k; Y ) 



pi(fcy) 



Npi(k;y) 



(17) 
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fit on the distribution p(k) fit on the cumulative distribution P(k) = Y.i>k P(J) 




Figure 1 : Comparison of various methods for estimating the exponent of a power-law. Each figure shows data for a single data set of N = 10000 
samples drawn from distribution |2l with y = 2.5. Insets present histograms of estimates for yfor 500 different data sets. 



A power-law distribution p\(k;y) is a monotonically decreas- 
ing function and therefore dl7> is an increasing function of k. 
Because the variation of the statistical error is not taken into 
account, the distribution of the estimate y is very broad. 

Methods that deal with the cumulative distribution have an 
additional weakness. Cumulation has the side-effect that the 
statistical errors in y, are not independent any more, which 
violates another assumption of the deviation of l|6) and |7J. 

To sum up, estimates of exponents of power-law distribu- 
tions based on a linear least squares fit are intrinsically inac- 
curate and lack a sound mathematical justification. 



3. Maximum likelihood estimators 

Maximum likelihood estimators offer a solid alternative to 
graphical methods. Let p(k;0) denote a single parameter prob- 
ability distribution. The maximum likelihood estimator On for 
the unknown parameter based on a sample m i , mi , . . . , mn of 



size N is given by 

N = argmax[L(0)] = argmax[lnL(0)] , (18) 
e e 

where 

N 

L{e)=Y\p{ mi ;e) (19) 

1=1 

denotes the likelihood function. In the limit of asymptotically 
large samples and under some regularity conditions maximum 
likelihood estimators share some desirable features 

• The estimator On exists and is unique. 

• The estimator On is consistent, that means for every e > 

]im¥\\G N -9\ <el =1, (20) 

where IP [| 9n — 9 1 < e] denotes the probability that the dif- 
ference | On — 1 is less than e. 
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• The estimator On is asymptotically normal with mean 
and variance 



NE 



-i 



(21) 



where E [•] indicates the expectation value of the quantity 
in the brackets. 

• Maximum likelihood estimators have asymptotically mini- 
mal variance among all asymptotically unbiased estimators. 
One says, they are asymptotically efficient. 



4. Maximum likelihood estimators for 
genuine power-laws 




The most general discrete genuine power-law distribution has 
a lower as well as an upper bound and is given by 



(k;r) = 



(22) 



for k G N with k 
tion 



C ( T> ^min ) ^max ) 

m <t< &max- Where the non-standard nota- 



C(Y,k 



im^max) : — C(7'^min) C(7i^max) 



(23) 



has been introduced. If the upper bound is missing the distri- 
bution 



k~? 



C(7^min) 



(24) 



has to be considered for k £ N with k > k m i n . The distributions 
i22i and i24l are generalizations of (|2ji and will be useful for 
the analysis of more general distributions that show a power- 
law behavior only in a certain range but have an arbitrary pro- 
file outside the power-law regime. This kind of distributions 
will be considered in section[5] 

The maximum likelihood estimator Yn for the parameter 7 
of the distribution (I22> follows from Jl 81 and is given by 



Yn = argmax 
y 



7 ( Xjlnm, j -AHnC(y, 

^mini &max) 



or equivalently by the implicit equation 



COM 



in ) ^max ) 



1 N 



(25) 



(26) 



which has to be solved numerically. The prime denotes the 
derivative with respect to 7. The asymptotic variance of this 
estimator Yn follows from 12 H and equals 



(Ayv) 2 4 



C(r,* 



nun j "-max 



C"(7 1 "-mm ? ^-max 



) C ( Yi ^min 1 ^max ) C ( Yi ^min j ^rnax ) ^ 



(27) 



Figure 2: Asymptotic standard deviation for maximum likelihood esti- 
mators for the exponent of a power-law distribution l24l . 
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Figure 3: Empirical distribution of the maximum likelihood estima- 
tor (histogram) versus its theoretical asymptotic distribution, which is 
given by a normal distribution with mean y = 2.5 and variance 1271 . 
The histogram has been obtained from the same data as in FigureQ 



In the limit k max — > °o equations tZ5\ . ( I26> . and J27l > give the 
maximum likelihood estimator and the asymptotic variance 
of this estimator for power-law distributions lacking an upper 
cut-off (124b . A graphical representation of the standard devia- 
tion J27l > in the limit k max — > °° is given in Figure|2] For each 
fixed k m i„ the quantity AynVN grows faster than linear with 7. 
Therefore the larger the exponent 7 the larger the sample size 
that is necessary to get an estimate within a given error bound. 

If the maximum likelihood method is applied (assuming 
a distribution l !24» to the same data as in section [2] numeri- 
cal experiments show that the estimates for the exponent are 
much more precise. The estimate has no identifiable system- 
atic bias, the standard deviation of the distribution of the esti- 
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mate is smaller by an order of magnitude compared to graphi- 
cal methods, see TableQJand Figure[3] 



5. Maximum likelihood method for 
general power-law distributions 

The maximum likelihood procedure outlined in section |4] can 
be generalized further to distributions p(k) that are no pure 
power-laws i22\ or i24\ but follow a power-law within a cer- 
tain finite range or follow a power-law in the whole tail of the 
distribution and have an arbitrary profile outside the power- 
law regime. The popurse of this section is to establish meth- 
ods for identifying the power-law regime and for estimating 
the exponent of the power-law regime without making special 
assumptions about the profile of the probability distribution 
beyond the power-law regime. 

The main problem for a generalization of the maximum 
likelihood approach is that there might be no good hypothe- 
sis for the profile of the probability distribution beyond the 
power-law regime. To overcome this difficulty the empirical 
data set is restricted to a window fc cm in m i < femax- (The 
following discussion covers the case of a power-law tail distri- 
butions as well by setting * cmax = °°.) Assuming that p(k) has 
a power-law profile for * cm j n < k < k cmax then the probabil- 
ity distribution of the restricted data set is given by ( 1221 with 
knm = fccmin and k max = k cmax (or by {24) with k min = k cmin ) 
and some unknown exponent y. This allows to estimate the 
power-law exponent by the application of the maximum like- 
lihood method on the restricted data set of size N' as presented 
in section|4]vwf/zoi<f making a hypothesis about the profile of 
the probability distribution beyond the power-law regime. 

In order to apply the maximum likelihood method one has 
to determine the cut-off points * C min and * cmax first. Here it 
has to be taken into account that if the window * cm j n < m,- < 
&cmax is chosen too large the estimate y is systematically bi- 
ased, but on the other hand if it is too small the statistical error 
is larger than necessary. In some cases one can make conser- 
vative estimates for * C min and A: cmax by plotting the empirical 
probability distribution (|8j on a double-logarithmic scale. An 
appropriate window can also be found by determining esti- 
mates %/i (& cm j n ) as a function of the window and a ^ 2 -test. 

Assuming the empirical data is drawn from a distribution 
with a power-law tail (no upper cut-off) the lower cut-off point 
&cmin can be determined in the following systematic way. By 
varying the parameter fc C min the maximum likelihood approach 
gives a sequence of estimates jy (fc C min)- If ^cmin is ver y large 
the estimate will be quite inaccurate because only a tiny frac- 
tion of the experimental data is taken into account; but the 
smaller the cut-off * C min the more accurate the estimate of the 
exponent. If * C min approaches the point from above (but is 
still above) where the probability distribution starts do differ 
from a power-law jy (fc C min) w ih gi ye a very precise estimate 
for the exponent y. On the other hand, if * C min is too small the 
hypothesis that the (restricted) empirical data is drawn from a 
power-law distribution is violated which causes a significant 
change of the estimate of the power-law exponent. 



If the empirical data is drawn from a distribution having 
both a lower crossover point as well as an upper crossover 
point a sequence of estimates jy (fc C min) is determined by re- 
stricting the data to a sliding window £ C min < nii < w£ cm in = 
^cmax with w > 1. As long as the window lies completely 
within the power-law regime the maximum likelihood esti- 
mate obtained from the restricted data set will give a reliable 
estimate of the power-law exponent. If the window lies at least 
partly outside the power-law regime the estimate is systemati- 
cally biased. 

To illustrate the procedures outlined above I generated two 
data sets from two distributions having a power-law regime. 
The first data set of N = 10000 samples was drawn from a 
distribution with a power-law tail which is given by 



-2.5 



P (k) 



for 1< k < 5 



k- 2 - 5 for k > 5 . 



(28) 



Plotting the sequence of estimates jy(fc C min) against the pa- 
rameter fccmin reveals the exponent y = 2.5 as wells as the 
crossover point k = 5 very clearly, see Figure |4] The second 
data set of = 1 00 000 samples was drawn from a distribution 
with two crossover points, viz. 



p(k) ~ ^ 



5~ 2 - 25 forl<fc<5 
k- 225 for5<£<100 (29) 

[lOO- 2 - 25 e-° 05 ( A - IO °) for* > 100. 



Figure |5] shows the sequence of estimates %/>(k C mm) that had 
been determined from restricted data sets of samples within 
the sliding window * cm j n < m, < 5* cm j n . This sequence ex- 
hibits a broad plateau that corresponds to the power-law expo- 
nent y = 2.25. If the window does not lie completely inside 
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Figure 4: Sequence of estimates $y(&cmin) as a function of the cut- 
off *cmin for a data set of N = 10000 samples from distribution l28l . 
Filled symbols mark where the ^ 2 -test has rejected the hypothesis 
that the restricted data follows a power4aw |24] with exponent y = 
7A"(*cmin)- An error probability of a = 0.001 was chosen. 
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"c min 



Figure 5: Sequence of estimates YN'{k C mm) as a function of the lower 
cut-off fc cm i n for a data set of N = 100000 samples from distribution 
1251 . Filled symbols mark where the ^ 2 -test has rejected the hypoth- 
esis that the restricted data follows a power-law |22j with exponent 
7= f/v'(^cmin)- An error probability of a = 0.001 and the window 
width w = 5 had been chosen. 

the power-law regime the estimate y N i {k cm \ n ) deviates system- 
atically from the known exponent. 

Apart from a visual inspection of the %i(k Qm [ n ) plot the 
crossover point(s) to the power-law regime can be determined 
by means of a ^ 2 -test. To apply a ^ 2 -test the data set has to be 
divided into some bins and the following binning turned out 
to be appropriate: The data is partitioned into a small number 
b, say b = 6, of bins. In the case of a distribution with a power- 
law tail this means each bin j collects nj items such that 

fl\ =N'p(k cmin ) ^1 =W cmin (fccmin;7A''(^cmm)) (30) 

«2 = N'p (k c min + 1 ) qi = Pkcrin (&cmin + 1 \ ?N> (^c min ) ) 

(31) 



If the to k cm i n < nii < £ cm ax restricted data is given by the 
power-law O or O with k min = £ cmin , fc max = £ cmax , 
and the exponent y = $y'(fc cm i n ) then the statistic c 2 follows 
asymptotically a ^-distribution with v = (b — 1) degrees of 
freedom, which is given by 

x v/2-l e -.v/2 

M'.v) = 2 _ v/2r(y/2) - (34) 

Let Xa be the (1 — a)-quantile of the distribution i34l . The hy- 
pothesis that the restricted data is given by the power-law i22\ 
or (ED , respectively, with k m m = k cm { n , k max = k cmax , and the 
exponent y = f N i (£ C mi n ) is accepted with the error probability 
a if c 2 < Xa- ^ the window £ cmm < m, < A: cmax lies not com- 
pletely within the power-law regime this hypothesis will be 
rejected by the ^ 2 -test and one can detect the upper crossover 
point as well as the lower crossover point (where the power- 
law loses its validity) in a reliable way, see Figure |4] and Fig- 
ure |3 



6. Computational remarks 

The normalizing factors of the probability distributions ( 1221 
and (1241 are given by the Hurwitz-^-function. This func- 
tion is less common than other special functions and may not 
be available in the reader's favorite statistical software pack- 
age but the GNU Scientific Library |7| offers an open source 
implementation of this function. A direct calculation of the 
Hurwitz-£-function by truncating the sum Q gives unsatis- 
factory results. 

The maximum likelihood estimator of the exponent can be 
computed numerically either by solving ( I25t or ( I26l l. Equation 
d25l > has the advantage that it can be solved without calculating 
derivatives of the Hurwitz- ^-function 01, whereas the solu- 
tion of i26\ involves its first derivative (e. g. bisection method) 
or even higher derivatives (e.g. Newton-Raphson method). 
An explicit implementation of these derivatives is often not 
available but may be calculated numerically. 



and finally 

n b =N' £ p(k) q b = £ Pk cmiD {k;?N' (fccmin)), 
k=k cm i n +b-l k=k cmin +b-l 

(32) 

where qj denotes the probability that a data point falls into 
bin j under the assumption that the (restricted) data follows 
the power-law (1241 with k m i n = ^cmin and the exponent y = 
yy(^cmin)- For distributions with a finite power-law regime 
the binning procedure can be carried out in a similar way. In 
this case the summation index in d32t is bounded by £ C min + 
b - 1 < k < £ C max and the probability p kcmm ,k cm ^ (k\ %> Ocmin)) 
has to be considered instead of pk cmia {k\ Yn' (^cmin))- 
The test statistic of the % 2 -test is given by 

, * (nj-N'qA 2 



7. Conclusion 

Methods based on a least squares fit are not suited to establish 
estimates for power-law distribution exponents because least 
squares fits rely on assumptions about the data set that are 
not fulfilled by empirical data from power-law distributions. 
In this paper maximum likelihood estimators have been intro- 
duced as a reliable alternative to graphical methods. These es- 
timators are asymptotically efficient and can be applied to data 
from a wide class of distributions having a power-law regime. 
The crossover points that separate the power-law regime from 
the rest of the distribution can be determined by a procedure 
based on a £ 2 -test. 

Finally I would like to mention that the idea to plot a se- 
quence of estimates %/' (^cmiql as shown in Figure 0] is re- 
lated to so-called Hill plots [Toll . The Hill estimator is 



7 



a maximum likelihood estimator for the inverse of the ex- 
ponent of the continuous Pareto distribution p(k) = (y — 
l)(£Amin)~ r Amin, see fi^l f° r a detailed discussion. 
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